Name:Qingcheng Wei
Collaborated with:
This homework is due on Feb. 27st at 11:55 pm.
Instruction: fill your answers in the .Rmd, compile it to HTML/PDF and submit the complied file. Uncompiled .Rmd file will not be graded.
Remark. This homework aims to help you understand PCA and its applications.
Import the dataset from “nba-teams-2017.csv”. Create a new data.frame that contains the following columns:
teamwinspointspoints3free_throwsoff_reboundsdef_reboundsassistsstealspersonal_fouls
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data=read.csv("nba-teams-2017.csv")
col=data.frame(data[,c("team","wins","points","points3","free_throws","off_rebounds","def_rebounds","assists","steals","personal_fouls")])
boxplot(col %>% select(-team))
we should scale the data as the variance for wins is much larger than the others.
pca_bball <- prcomp(col %>% select(-team), scale = TRUE)
pca_bball$rotation[,1:4]
## PC1 PC2 PC3 PC4
## wins -0.42366308 0.07555818 -0.11681772 0.21657745
## points -0.50246947 -0.21708761 0.19677723 -0.07946391
## points3 -0.41664778 0.17268215 -0.08316881 -0.51204012
## free_throws -0.24452950 -0.41519726 0.30946852 -0.33272209
## off_rebounds 0.08111297 -0.39160091 0.47926467 0.46463787
## def_rebounds -0.26053718 0.26461371 0.57662166 0.15459073
## assists -0.45236958 0.05322237 -0.26482231 0.27220000
## steals -0.20525546 -0.41895791 -0.45168498 0.37698249
## personal_fouls 0.11583180 -0.58585494 -0.09277492 -0.34335891
screeplot(pca_bball, type = "l", npcs = 15, main = "Screeplot of the first 10 PCs")
abline(h = 1, col="red", lty=5)
legend("topright", legend=c("Eigenvalue = 1"),
col=c("red"), lty=5, cex=0.6)
Precisely: for each \(1\leq k \leq 10\) you are plotting:
\[\frac{\sum_{j=1}^k d_{j}^2}{\sum_{j=1}^{10} d_j^2}. \]
cumpro <- cumsum(pca_bball$sdev^2 / sum(pca_bball$sdev^2))
plot(cumpro[0:15], xlab = "PC #", ylab = "Amount of explained variance", main = "Cumulative variance plot")
abline(h = 0.9, col="blue", lty=5)
legend("topleft", legend=c("Cut-off @ PC6"),
col=c("blue"), lty=5, cex=0.6)
By looking at the plot, if we want to retain 90% of the variance, we would keep 5 PCs.
pca_scores <- pca_bball$x # PCs/ PC scores
low_dim_rep <- pca_scores %>%
data.frame() %>%
mutate(team = col$team) %>%
select(team, everything())
library(ggplot2)
ggplot(low_dim_rep, aes(x = PC1, y = PC2)) +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
geom_text(aes(label = team), size = 2) +
coord_cartesian(xlim = c(-8, 5)) +
theme_linedraw()
The team that’s close with each other within the plot shows similarity in its data and the team that’s far away each other shows difference in its performance. In fact, for example, the Phoniex Suns has a larger variation with the rest of the team, wehreas GSW did extremly well in 2017 also sit away from the center of the plot.
Import the image from “Redfin_house.png”. Let \(X\) be the pixel intensity associated with the red color in the image.
Hints.
Review tutorial in “HM6/More PCA Examples” in class dropbox folder. Example 2 can be useful for this problem, e.g., how to load the png image data to R use R function readPNG.
See the Value section of ?png::readPNG to remind yourself of the organization of the raster array output.
library(png, quietly = TRUE)
library(grid, quietly = TRUE)
directory <- "Redfin_house.png"
img <- readPNG(directory)
img_plot <- as.raster(img)
grid.raster(img_plot)
img_red_only <- img[, , 1]
img.out <- prcomp(img_red_only, scale = TRUE)
dim(img.out$x)
## [1] 505 505
hist(img_red_only,
main = "The pixel intensities of Img",
col = "blue")
pr.var <- img.out$sdev^2
pve <- pr.var / sum(pr.var)
par(mfrow = c(1,2))
plot(pve, xlab = "Principal Component", ylab = "Proportion of Variance Explained")
plot(cumsum(pve), xlab = "Principal Component", ylab = "Cumulative Proportion of
Variance Explained", ylim = c(0, 1), type = "b", xlim = c(1, 50))
cum_explained1=cumsum((img.out$sdev)^2)/sum((img.out$sdev)^2)
cat("Num of PCs to explain at least 90% of variation is ", length(cum_explained1[cum_explained1<=0.9])+1)
## Num of PCs to explain at least 90% of variation is 24
cumsum(pve)
## [1] 0.2962951 0.4951405 0.5675534 0.6340435 0.6792437 0.7156087 0.7430687
## [8] 0.7665766 0.7861470 0.8038898 0.8202550 0.8311917 0.8407286 0.8497609
## [15] 0.8572818 0.8645586 0.8705091 0.8756694 0.8806781 0.8853024 0.8895810
## [22] 0.8934217 0.8970729 0.9006134 0.9038637 0.9069463 0.9098661 0.9126925
## [29] 0.9152774 0.9177672 0.9201859 0.9224303 0.9246450 0.9267792 0.9286603
## [36] 0.9304212 0.9321713 0.9338493 0.9354605 0.9370064 0.9384820 0.9399005
## [43] 0.9412789 0.9426322 0.9439237 0.9451499 0.9463496 0.9475123 0.9486630
## [50] 0.9497718 0.9508557 0.9519179 0.9529472 0.9539513 0.9549048 0.9558311
## [57] 0.9567263 0.9576004 0.9584558 0.9592755 0.9600771 0.9608612 0.9616328
## [64] 0.9623931 0.9631364 0.9638241 0.9645064 0.9651838 0.9658453 0.9664672
## [71] 0.9670817 0.9676880 0.9682927 0.9688794 0.9694526 0.9700124 0.9705553
## [78] 0.9710948 0.9716179 0.9721251 0.9726267 0.9731236 0.9736068 0.9740838
## [85] 0.9745505 0.9750057 0.9754508 0.9758757 0.9762957 0.9767066 0.9771151
## [92] 0.9775173 0.9779133 0.9782967 0.9786721 0.9790414 0.9794035 0.9797594
## [99] 0.9801127 0.9804630 0.9808028 0.9811334 0.9814564 0.9817667 0.9820760
## [106] 0.9823747 0.9826686 0.9829537 0.9832315 0.9835082 0.9837785 0.9840467
## [113] 0.9843090 0.9845685 0.9848193 0.9850666 0.9853026 0.9855349 0.9857645
## [120] 0.9859926 0.9862141 0.9864306 0.9866463 0.9868562 0.9870650 0.9872713
## [127] 0.9874734 0.9876743 0.9878715 0.9880604 0.9882478 0.9884298 0.9886111
## [134] 0.9887921 0.9889682 0.9891398 0.9893087 0.9894752 0.9896397 0.9897975
## [141] 0.9899545 0.9901097 0.9902622 0.9904129 0.9905633 0.9907129 0.9908582
## [148] 0.9910011 0.9911416 0.9912818 0.9914179 0.9915510 0.9916809 0.9918088
## [155] 0.9919354 0.9920618 0.9921836 0.9923033 0.9924228 0.9925390 0.9926534
## [162] 0.9927666 0.9928780 0.9929866 0.9930933 0.9931977 0.9933017 0.9934040
## [169] 0.9935035 0.9936022 0.9936986 0.9937934 0.9938876 0.9939791 0.9940692
## [176] 0.9941571 0.9942445 0.9943312 0.9944176 0.9945003 0.9945814 0.9946621
## [183] 0.9947420 0.9948205 0.9948982 0.9949751 0.9950508 0.9951261 0.9952003
## [190] 0.9952738 0.9953451 0.9954158 0.9954856 0.9955548 0.9956229 0.9956905
## [197] 0.9957579 0.9958238 0.9958889 0.9959532 0.9960163 0.9960772 0.9961379
## [204] 0.9961978 0.9962573 0.9963165 0.9963748 0.9964322 0.9964882 0.9965435
## [211] 0.9965985 0.9966518 0.9967049 0.9967571 0.9968076 0.9968575 0.9969068
## [218] 0.9969554 0.9970033 0.9970508 0.9970971 0.9971433 0.9971891 0.9972337
## [225] 0.9972780 0.9973209 0.9973632 0.9974050 0.9974467 0.9974874 0.9975277
## [232] 0.9975673 0.9976062 0.9976445 0.9976823 0.9977195 0.9977564 0.9977930
## [239] 0.9978293 0.9978650 0.9978997 0.9979341 0.9979677 0.9980010 0.9980335
## [246] 0.9980653 0.9980969 0.9981284 0.9981590 0.9981892 0.9982194 0.9982492
## [253] 0.9982785 0.9983073 0.9983358 0.9983640 0.9983915 0.9984188 0.9984459
## [260] 0.9984726 0.9984988 0.9985246 0.9985498 0.9985748 0.9985994 0.9986233
## [267] 0.9986471 0.9986707 0.9986941 0.9987169 0.9987395 0.9987615 0.9987832
## [274] 0.9988049 0.9988259 0.9988467 0.9988672 0.9988873 0.9989072 0.9989267
## [281] 0.9989458 0.9989647 0.9989834 0.9990019 0.9990203 0.9990384 0.9990562
## [288] 0.9990739 0.9990911 0.9991079 0.9991246 0.9991411 0.9991573 0.9991734
## [295] 0.9991890 0.9992044 0.9992196 0.9992343 0.9992489 0.9992634 0.9992775
## [302] 0.9992916 0.9993054 0.9993189 0.9993321 0.9993453 0.9993582 0.9993709
## [309] 0.9993834 0.9993958 0.9994080 0.9994202 0.9994320 0.9994438 0.9994553
## [316] 0.9994665 0.9994775 0.9994884 0.9994992 0.9995099 0.9995202 0.9995306
## [323] 0.9995406 0.9995504 0.9995600 0.9995696 0.9995788 0.9995878 0.9995967
## [330] 0.9996056 0.9996144 0.9996230 0.9996314 0.9996396 0.9996479 0.9996559
## [337] 0.9996637 0.9996714 0.9996789 0.9996862 0.9996934 0.9997005 0.9997075
## [344] 0.9997144 0.9997211 0.9997278 0.9997343 0.9997408 0.9997472 0.9997534
## [351] 0.9997593 0.9997652 0.9997710 0.9997767 0.9997823 0.9997877 0.9997931
## [358] 0.9997983 0.9998035 0.9998085 0.9998136 0.9998184 0.9998231 0.9998278
## [365] 0.9998324 0.9998368 0.9998412 0.9998454 0.9998497 0.9998538 0.9998579
## [372] 0.9998620 0.9998660 0.9998698 0.9998736 0.9998773 0.9998810 0.9998846
## [379] 0.9998881 0.9998916 0.9998950 0.9998983 0.9999016 0.9999048 0.9999079
## [386] 0.9999109 0.9999139 0.9999168 0.9999196 0.9999224 0.9999250 0.9999276
## [393] 0.9999302 0.9999327 0.9999352 0.9999376 0.9999399 0.9999422 0.9999444
## [400] 0.9999466 0.9999487 0.9999507 0.9999527 0.9999546 0.9999565 0.9999584
## [407] 0.9999601 0.9999618 0.9999635 0.9999651 0.9999667 0.9999682 0.9999697
## [414] 0.9999711 0.9999725 0.9999739 0.9999752 0.9999764 0.9999776 0.9999788
## [421] 0.9999799 0.9999810 0.9999821 0.9999832 0.9999842 0.9999851 0.9999861
## [428] 0.9999870 0.9999878 0.9999886 0.9999894 0.9999902 0.9999909 0.9999916
## [435] 0.9999923 0.9999929 0.9999935 0.9999941 0.9999946 0.9999951 0.9999955
## [442] 0.9999959 0.9999963 0.9999966 0.9999969 0.9999971 0.9999974 0.9999976
## [449] 0.9999977 0.9999979 0.9999981 0.9999982 0.9999983 0.9999984 0.9999985
## [456] 0.9999986 0.9999987 0.9999987 0.9999988 0.9999989 0.9999989 0.9999990
## [463] 0.9999990 0.9999991 0.9999991 0.9999992 0.9999992 0.9999993 0.9999993
## [470] 0.9999993 0.9999994 0.9999994 0.9999995 0.9999995 0.9999995 0.9999995
## [477] 0.9999996 0.9999996 0.9999996 0.9999996 0.9999997 0.9999997 0.9999997
## [484] 0.9999997 0.9999998 0.9999998 0.9999998 0.9999998 0.9999998 0.9999998
## [491] 0.9999999 0.9999999 0.9999999 0.9999999 0.9999999 0.9999999 0.9999999
## [498] 0.9999999 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## [505] 1.0000000
pr.out=img.out
W <- pr.out$rotation #the loading matrix
pc.image <- list()
num.pcs <- c(1, 5, 10, 15, 20, 30, 50, 100, 200)
#scale the original image
Image <- scale(img_red_only)
for(j in 1:length(num.pcs)){
u.proj <- W
#we will only use the first num.pcs PC loadings so set the remaining to 0
u.proj[, (num.pcs[j] + 1) : 396] <- 0
#Make the projection
projection <- (Image%*%u.proj)%*%t(u.proj)
#to draw an image, values need to be between 0 and 1
scaled <- (projection - min(as.numeric(projection)))
scaled <- scaled / max(as.numeric(scaled))
pc.image[[j]] <- as.raster(scaled)
}
#plot each of the images
grid.raster(pc.image[[1]])
cumsum(pve)[1]
## [1] 0.2962951
grid.raster(pc.image[[2]])
cumsum(pve)[5]
## [1] 0.6792437
grid.raster(pc.image[[3]])
cumsum(pve)[10]
## [1] 0.8038898
grid.raster(pc.image[[4]])
cumsum(pve)[15]
## [1] 0.8572818
grid.raster(pc.image[[5]])
cumsum(pve)[20]
## [1] 0.8853024
grid.raster(pc.image[[6]])
cumsum(pve)[30]
## [1] 0.9177672
grid.raster(pc.image[[7]])
cumsum(pve)[50]
## [1] 0.9497718
grid.raster(pc.image[[8]])
cumsum(pve)[100]
## [1] 0.980463
grid.raster(pc.image[[9]])
cumsum(pve)[200]
## [1] 0.9959532